Description

BACKGROUND OF THE INVENTION
1. Field of the Invention 

[0001] The present invention relates to a synchroniz- 
ing system which reproduces a video signal and an au- 
dio signal synchronously in a system that transfers cod- 
ed video and audio signals in time-division multiplexing, to 

Description of Background Information

[0002] As a method of recording, reproducing or
transferring compressed and coded video and audio
signals and other data in time-division multiplexing,
there is MPEG (Moving Picture Experts Group),
which conforms to ISO 11172.

[0003] The compressive coding of video signals in this 
scheme employs predictive coding in combination with 
motion compensation, and discrete cosine transform 

[0004] The method described in the ISO 11172 re- 
quires that a counter having many bits should be pro- 
vided in the reproducing apparatus, and that decoding 
timing should be so controlled as to start the presenta- 
tion of decoded data as a video image or voices and 
sound when the value of the counter coincides with the 
presentation time stamp (PTS). The control circuit there- 
fore becomes complicated. 

[0005] In EP-A-0,598,295 there is described a video 
and audio signal multiplexing and separating apparatus 
including a buffer memory for time axis adjustment of 
the audio signal. 

[0006] The number of audio signals in a buffer mem- 
ory is kept constant by detection of samples different 
from a specified value. This allows effective use of the 
buffer memory capacity. 

[0007] In Signal Processing Image Communication, 
vol 4, No.2, April 92, pp 153-159, there is described a 
specification for MPEG systems coding for compressed 
digital video and audio signals. 

SUMMARY OF THE INVENTION

[0008] It is therefore an object of the present invention 
to provide a synchronizing system with a simple struc- 
ture, which can accomplish synchronous reproduction 
without complicating a control circuit for synchronizing 
video and audio signals with each other. 
[0009] To achieve the above object, according to one
aspect of this invention, there is provided a method of
transmitting time-divided video and audio signals,
comprising the steps of:

coding a predetermined time slot (GOP) of video
signals to form a stream of video data;

coding a predetermined number of samples of audio
signals to form a unit audio data block and forming
a stream of audio data consisting of an integer
number of unit audio data blocks whose aggregate
time slot approximately corresponds to said
predetermined time slot;

performing time-division multiplexing on said
stream of video data (GOP) and said stream of audio
data, storing resultant data in a pack (VP, AP)
having said predetermined time slot (GOP) and
transferring the video signals and audio signals in
pack periods each comprising a stream of a
predetermined number of said packs (VP, AP);

wherein the total time slot of the video signals in
a pack (VP, AP) period is equal to the total time slot of
the audio signals in the pack period, and the number of
audio data blocks in the packs of a period is varied
cyclically so that the presentation start times for the
stream of video data and the stream of audio data in a
pack in the pack period depend on the position of the
pack in the pack period and are constant for a pack in that
position from one pack period to another, and affixing to
the pack positional information giving the position of the
pack in the pack period.
[0010] According to another aspect of this invention,
there is provided a method of reproducing time-divided
video and audio signals transmitted by the method of
claim 1, comprising a step of referring to positional
information of a pack in a pack period and controlling
at least one of the presentation start times for
the video signals and the audio signals for each pack in
the pack period so that the difference between the
presentation start times for the video signals and audio
signals corresponds to the difference between
presentation start times denoted by the position of the
pack in the pack period.

BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a diagram showing the directions of
prediction between frames of video signals in the
compressive coding which conforms to ISO 11172;
Fig. 2 is a diagram showing the transmission state 
of a video stream which conforms to ISO 11172; 
Fig. 3 is a diagram exemplifying the multiplexing of 
various kinds of data that is specified by the system 
part of MPEG which conforms to ISO 11172; 
Fig. 4 is a diagram for explaining various time 
stamps and reference time information, showing 
how a stream of multiplexed packs in Fig. 3 are re- 
produced; 

Fig. 5 is a diagram showing a data format in a meth- 
od of recording compressed and coded data; 
Fig. 6 is a schematic block diagram of an encoder
which accomplishes a method of making the
amount of data of GOP (Group Of Pictures) constant
and which is employed in one embodiment of
the present invention;

Fig. 7 is a time chart for explaining the operation of 
a buffer memory of the encoder shown in Fig. 6; 
Fig. 8 is a diagram showing the format of a pack in 
a transmission method according to one embodi- 
ment of the present invention and the time-sequen- 
tial relation between actual video information and 
audio information in that pack; 
Fig. 9 is a diagram showing the relation among an 
AAU (Audio Access Unit) sequence number, the 
number of AAUs in one pack, and the difference be- 
tween presentation start times for video signals and 
audio signals in a reproducing apparatus; and 
Fig. 10 is a block diagram of a reproducing apparatus
which accomplishes a reproducing method according
to another embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0012] Before discussing a preferred embodiment of 
the present invention, the conventional compressive 
coding method will be described referring to the accom- 
panying drawings. 

[0013] An image coded by the MPEG scheme consists
of an I picture (Intra coded picture) coded within a
frame, a P picture (Predictive coded picture) obtained
by coding the difference between the current image and
a past picture (decoded image of an I or P picture) and
a B picture (Bidirectionally predictive coded picture)
obtained by coding the difference between the current
image and an interpolated image which is predicted
bidirectionally from past and future images. The
predictive directions are illustrated in Fig. 1.
[0014] Referring to Fig. 1, coded frame images are 
symbolized as parallelograms frame by frame. Those 
frame images correspond to consecutive frames of input 
video signals, and "I", "P" and "B" affixed to the frame 
images indicate the aforementioned types of pictures of 
the frame images. The arrowheads indicate the direc- 
tions of prediction between frames. 
[0015] A certain video sequence unit is collectively 
called "GOP" (Group Of Pictures). As one example, 15 
frames are treated as this unit in Fig. 1 and are sequen- 
tially given frame numbers. 

[0016] The compression efficiency in this coding varies
with the coding scheme of the individual picture types.
The compression efficiency is the highest for B pictures,
the next highest for P pictures and the lowest for
I pictures. After compression, therefore, the I
picture has the largest amount of data, the P picture has
the next largest amount of data, and the B picture has
the smallest amount of data. The amounts of data of each
frame and each GOP are not constant and differ depending
on video information to be transmitted.
[0017] While the order of uncompressed frames is
as shown in Fig. 1, the order of compressed frames at
the time of transmission becomes as shown in Fig. 2 for
the purpose of reducing the delay time in the decoding
process.

[0018] Portions (a) and (b) in Fig. 2 conceptually
illustrate each coded frame image in view of the amount of
data after compression, and the picture types I, P and
B and frame numbers correspond to those shown in Fig. 1.
The coded video signals are arranged in the order of
frame numbers as illustrated, and a sequence header
SQH can be affixed to ensure independent reproduction
GOP by GOP as shown in a portion (c) in Fig. 2. The
sequence header, which is located at least at the head
of a stream of data or a video stream as shown in the
portion (b) in Fig. 2, describes information about the
entire video stream. The sequence header may be affixed
to the head of every GOP to ensure reproduction of data
from a middle part of each GOP, and includes initial data
needed for the decoding process, such as the size of an
image and the ratio of the vertical pixels to the horizontal
pixels. A video stream to be transferred to a decoder is
formed in the above manner.

[0019] The system part of the MPEG further specifies
a scheme of multiplexing a compressed audio stream
and a stream of other data in addition to the
aforementioned compressed video stream and accomplishing
the synchronized reproduction of those streams.

[0020] Fig. 3 exemplifies the multiplexing of various
kinds of data, which is specified by the system part of
the MPEG.

[0021] In Fig. 3, a portion (a) indicates a data stream
of coded video signals consecutively arranged in the
order of GOPs as indicated by the portion (c) in Fig. 2,
i.e., a video stream, and a portion (b) indicates a data
stream of audio signals that are compressed and coded
by a predetermined coding scheme which will not be
discussed in detail. Partial data of each stream is stored in
a packet together with a packet header located at the
head of the packet. A packet in which video stream data
is stored is called a video packet (VP), and a packet in
which audio stream data is stored is called an audio
packet (AP). Likewise, a packet in which a stream of
data other than video and audio signals, such as control
data, is stored is called a data packet (DP), though not
illustrated.

[0022] Some of those packets are grouped as a pack
with a pack header placed at the head of this pack. The
packets are transmitted pack by pack in the form shown
in a portion (c) in Fig. 3. In the packet transmission, the
pack header serves as a system header (SH) which
describes information about the whole pack stream and
includes a pack start code PS and a system clock
reference SCR that indicates the reference of time. The
packet header includes a presentation time stamp PTS
and a decoding time stamp DTS as needed. A pack is
the collection of individual partial streams each
corresponding to a packet.

[0023] SCR in the pack header is the number of cycles
of a 90 kHz system clock counted from some point of time,
and is used as a reference of time in reproducing the 
associated pack. PTS in the packet header represents 
the time at which the presentation of the packet contain- 
ing that PTS as a video image or sound and voices 
starts, by the number of the system clocks counted. DTS 
represents the time at which decoding of the packet con- 
taining that DTS starts. For B pictures in a video packet 
and an audio packet, the time data of PTS equals that 
of DTS so that DTS need not particularly be described. 
For I and P pictures in a video packet, since the pres- 
entation starting time lags from the decoding starting 
time due to the rearrangement of the frames in the op- 
posite order to the one shown in Fig. 2, PTS and DTS 
should be inserted as needed. PTS or a combination of 
PTS and DTS is inserted in a stream of video and audio 
packets at an interval of 0.7 sec or below. 
[0024] In reproducing such a stream of packs, the value
of SCR is loaded into a counter in a reproducing
apparatus, and thereafter the counter counts the
system clock and is used as a clock. When PTS or DTS
is present, each packet is decoded with such timing that
its presentation as a video image or sound and voices
starts when the value of the counter coincides with PTS.
With neither PTS nor DTS present, each packet is
decoded following the decoding of the previous packet
of the same kind.
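
The counter-based timing of paragraphs [0023] and [0024] may be pictured with the following minimal sketch. It is an illustration only and is not taken from ISO 11172: the function name, the packet names and the tick values are hypothetical, and packets without a PTS (which the text says simply follow the previous packet of the same kind) are not modeled.

```python
SYSTEM_CLOCK_HZ = 90_000  # 90 kHz system clock named in the text


def presentation_events(scr, packets):
    """Simulate the counter-based timing described above.

    scr     -- SCR value loaded into the counter (clock ticks)
    packets -- list of (name, pts) pairs; pts is None when the
               packet header carries no PTS (not modeled here)
    Returns a list of (name, ticks_after_scr) in presentation order.
    """
    counter = scr  # the counter is preset with SCR
    events = []
    stamped = [(pts, name) for name, pts in packets if pts is not None]
    for pts, name in sorted(stamped):
        if pts < counter:
            raise ValueError(f"PTS of {name} is already in the past")
        counter = pts  # the counter runs until it coincides with PTS
        events.append((name, counter - scr))
    return events


# Hypothetical values: video PTS 0.1 s and audio PTS 0.2 s after SCR.
events = presentation_events(
    scr=1_000,
    packets=[("video", 10_000), ("data", None), ("audio", 19_000)],
)
seconds = [(name, ticks / SYSTEM_CLOCK_HZ) for name, ticks in events]
```

The comparison of a wide counter against every time stamp is the part of the conventional scheme that the description regards as complicating the control circuit.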

[0025] The above will be conceptually explained below.
Suppose that SCR of a pack 1 has been input at
time t11 based on the system clock indicated in a portion
(b) in Fig. 4 which illustrates the reproduction state of
the stream of packs that are denoted by the same
shapes and reference numerals as used in Fig. 3. Time
data t11 is described in the SCR. As video stream data
whose presentation starts from time t12 is stored in data
DATA11 in the first packet in the pack 1, time data t12
is described in PTS of that packet. As audio stream data
whose presentation starts from time t13 is stored in data
DATA13 in the third packet in the pack 1, time data t13
is described in PTS of that packet. As the end portion of
GOP1 whose presentation starts from time t15 and the
head portion of subsequent GOP2 are stored in data
DATA14 in the fourth packet in the pack 1, time data t15
is described in PTS of that packet. For the subsequent
packets, SCR and PTS are described in the same manner.
A portion (c) in Fig. 4 shows presented video signals
and a portion (d) presented audio signals. Although no
PTS is described in the header of that packet which
stores packet data DATA12, such a description is
unnecessary as long as PTS is inserted at the
aforementioned interval of 0.7 sec or below. Assuming that GOP
has the structure shown in Fig. 2, then the packet data
DATA11 is stored from the data of the first I picture of
the GOP1, so that a value equivalent to the time earlier
by three frames than PTS is described in DTS in the
packet header of the DATA11.



[0026] As mentioned earlier, the method described in
the ISO 11172 has shortcomings such that a counter
having many bits should be provided in the reproducing
apparatus, and that decoding timing should be so
controlled as to start the presentation of decoded data as a
video image or sound and voices when the value of the
counter coincides with the presentation time stamp
(PTS), thus complicating the control circuit.
[0027] The present invention will now be described in
detail referring to the accompanying drawings.

[0028] First, the length of one pack is set equal to the
time slot for one GOP (e.g., 15 frames) of video signals,
which are stored in a compressed form in video packets
in one pack.

[0029] This is accomplished by the following method.
[0030] Fig. 5 is a diagram showing a data format in a 
method of recording compressed and coded data ac- 
cording to such a method. 

[0031] In Fig. 5, the amount of data of video signals
after compression as shown in (b) in Fig. 2 differs frame
by frame but should always be constant in one GOP. A
scheme for making the amount of data in a single GOP
constant will be discussed later. A portion (a) in Fig. 5
shows the amount of data for each frame in a GOP, with
the vertical scale representing the amount of data and
the horizontal scale representing frames 3I to 14B. The
data of the GOP is stored as a video packet VP in a pack
together with an audio packet AP and a data packet DP
as indicated in a portion (b) in Fig. 5. The size of one
logical block (portion (c) in Fig. 5) of a predetermined
recording medium on which such packets are recorded
is 2048 bytes, and one pack has a size of 2048 x 144
bytes (144 logical blocks). In one pack, the system
header SH including the pack start code PS and system
clock reference SCR, and a data packet DP occupy
2048 x 12 bytes, an audio packet AP occupies 2048 x
8 bytes and four video packets VP occupy 2048 x 124
bytes.

[0032] The upper limits of the bit rates for audio and
video signals after compression become as follows.

audio: 2048 x 8 x 8 / 0.5005 = 261.88 (kbps)

video: 2048 x 124 x 8 / 0.5005 = 4.059 (Mbps)
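
The two upper limits follow directly from the pack layout and the 0.5005 sec pack time slot. The short sketch below merely repeats that arithmetic so the figures can be checked; the function and variable names are illustrative and do not appear in the original.

```python
LOGICAL_BLOCK_BYTES = 2048   # size of one logical block
PACK_SECONDS = 0.5005        # time slot of one pack (one 15-frame GOP)


def max_bit_rate(blocks_per_pack):
    """Upper limit in bit/s for a packet occupying the given blocks."""
    return blocks_per_pack * LOGICAL_BLOCK_BYTES * 8 / PACK_SECONDS


audio_kbps = max_bit_rate(8) / 1e3     # audio packet: 8 logical blocks
video_mbps = max_bit_rate(124) / 1e6   # video packets: 124 logical blocks
print(f"audio: {audio_kbps:.2f} kbps, video: {video_mbps:.3f} Mbps")
```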

[0033] The above bit rates are sufficient to transmit
two channels of high-quality audio signals and video
signals having a high image quality. To provide four
channels of audio signals, the size of the data packet DP
should be changed to 2048 x 4 bytes and an audio
packet of 2048 x 8 bytes should be added so that each
pack contains two systems of audio signals.

[0034] The number of physical blocks (portion (d) in
Fig. 5) of the predetermined recording medium varies
depending on the error correcting system, particularly,
the property of a burst error and the size of redundancy
allowed by the error correction code in the recording and
reproducing system for the recording medium. For
instance, when one physical block has a size of 2^16 =
65536 bytes, one pack has four and a half physical blocks,
and when one physical block has a size of 2^15 = 32768
bytes, one pack has nine physical blocks.
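
As a cross-check of the pack layout and of the physical block counts quoted above, the figures can be recomputed as follows. This is only a sketch of the arithmetic; the dictionary keys and function name are illustrative labels, not terms defined in the description.

```python
LOGICAL_BLOCK_BYTES = 2048

# Logical blocks per pack, as given for Fig. 5.
LAYOUT_BLOCKS = {"SH+DP": 12, "AP": 8, "VP": 124}

pack_blocks = sum(LAYOUT_BLOCKS.values())        # 144 logical blocks
pack_bytes = pack_blocks * LOGICAL_BLOCK_BYTES   # bytes in one pack


def physical_blocks_per_pack(physical_block_bytes):
    """Number of physical blocks (possibly fractional) in one pack."""
    return pack_bytes / physical_block_bytes


print(physical_blocks_per_pack(2**16), physical_blocks_per_pack(2**15))
```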
[0035] As the audio packet AP contains compressed
audio signals which should be reproduced at
substantially the same time as the GOP, decoding the audio
signals and reproducing them in synchronism with the
video signals requires a buffer memory which has a capacity
to store at least one packet of audio signals plus audio
signals for the decoding delay of video signals. Because
the audio signals carry a small amount of data, however,
the buffer memory can have a small capacity.
[0036] The following exemplifies a method of making 
the amount of data in a video stream constant in one 
GOP as mentioned above. 

[0037] Fig. 6 presents a schematic block diagram of 
an encoder which accomplishes this method. 
[0038] In Fig. 6, the encoder comprises a frame order
changing section 11, a motion detector 12, a
differentiator 13, a discrete cosine transformer (DCT) 14, a
quantizer 15, a variable length coder (VLC) 16, a multiplexer
17, a buffer memory 18, an inverse quantizer 19, an
inverse DCT 20, an adder 21 and a frame accumulating
and predicting section 22. The predicting section 22
detects the moving vector, and determines the prediction
mode. The inverse DCT 20, inverse quantizer 19 and
adder 21 constitute a local decoder.
[0039] The basic function of this encoder is to perform 
discrete cosine transform (DCT) of an input digital video 
signal by the DCT 14, quantize the transformed coeffi- 
cient by the quantizer 15, encode the quantized value 
by the VLC 16 and output the coded data as a video 
stream via the buffer memory 18. The DCT, quantization 
and coding are carried out in accordance with the de- 
tection of the moving vector, the discrimination of the 
prediction mode, etc., which are accomplished by the 
local decoder, the predicting section 22 and the motion 
detector 12. 

[0040] While the basic structure and function of this
encoder are described in the specifications of the
aforementioned ISO 11172, the block which makes the
amount of data in one GOP in the output video stream
constant will be discussed in the following description.
[0041] This block comprises a code amount calculator
23, a quantization controller 24, a stuffing data
generator 25, and a timing controller 26. The code amount
calculator 23 obtains the amount of stored data occupying
the buffer memory 18 and calculates the accumulated
amount of coded video data (amount of codes) at the
input section of the buffer memory 18, counted from
the head of the GOP. The quantization controller 24
determines the quantizer scale for each predetermined
unit obtained by dividing one frame by a predetermined
size in accordance with the amount of the stored data
and the amount of accumulated data, and controls the
amount of coded data. The stuffing data generator 25
generates predetermined stuffing data in accordance
with the amount of accumulated data. The timing
controller 26 generates timing signals necessary for the
individual sections, such as a horizontal sync signal
Hsync, a frame sync signal FRsync and a GOP sync
signal GOPsync, based on the input digital video signal.
The quantizer 15 quantizes the coefficient after DCT,
divides this value by the quantizer scale obtained by the
quantization controller 24, and then outputs the
resultant value. The quantizer scale becomes an input to the
multiplexer 17. The output of the stuffing data generator
25, which will be discussed later, is also one input to the
multiplexer 17.

[0042] The buffer memory 18 functions as illustrated
in Fig. 7. A variable amount of coded data is generated
and written in the buffer 18 at times 0, 1T, 2T and so
forth (T: frame period). In this diagram, the arrows and
their lengths respectively represent the writing
directions and the amount of data in the memory 18. The data
is read out from the buffer memory 18 at a constant rate.
This is represented by the inclined, broken lines in the
diagram. The writing and reading are repeated in the
illustrated manner. The code amount calculator 23
obtains the amount of data occupying the buffer memory
18, and the quantization controller 24 alters the
quantizer scale of the quantizer 15 based on the amount of
occupying data in such a way that the buffer memory 18
does not overflow or underflow, thus controlling the
amount of data to be input to the buffer memory 18. As
the quantizer scale of the quantizer 15 increases, the
amount of output data therefrom decreases. As the
quantizer scale decreases, on the other hand, the
amount of output data from the quantizer 15 increases.
The image quality, however, is inversely related to the
quantizer scale. This control on the amount of codes is also
described in the specifications of the ISO 11172 as a
method of transferring, at a constant rate, a variable amount
of coded data generated frame by frame.

[0043] As the amount of data in each GOP is constant
in this embodiment, the following control is carried out
in addition to the above-described control on the amount
of data.

[0044] The value of the quantizer scale may be
determined as follows.

[0045] Under the condition that the amount of data in
one GOP be constant, the quantization controller 24
calculates, based on the amount of data set previously
block by block, the amount of accumulated data from the
head block of the GOP to the block immediately preceding
the current block (expected amount of accumulated data).
The quantization controller 24 obtains the difference
between this expected amount of accumulated data and
the amount of data obtained by the code amount
calculator 23, i.e., the amount of accumulated data actually
coded and generated from the head block of the GOP to the
block immediately preceding the current block (actual
amount of accumulated data), and determines the value
of the quantizer scale in accordance with the positive or
negative sign of that difference and the absolute value
thereof so that the actual amount of accumulated data
approaches the expected amount of accumulated data as
closely as possible without exceeding it. The top of each
GOP is indicated by the GOP sync signal GOPsync from the
timing controller 26.

[0046] The amount of data for each block may be set 
in the following manner. 

(1) The ratio of the amounts of data of I, P and B 
pictures for each frame is determined. 

For example, I:P:B = 15:5:1.

(2) The amount of data of each frame determined 
by the ratio given in the above process (1) is evenly 
allocated to the individual blocks in one frame. 
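
Steps (1) and (2) can be sketched as below. The sketch rests on assumptions that are not stated in the description: a 15-frame GOP made of one I, four P and ten B pictures, a GOP budget equal to the 2048 x 124 bytes of the video packets, and a hypothetical count of 330 blocks per frame. Integer division is used, so a few bytes of the budget remain unallocated.

```python
RATIO = {"I": 15, "P": 5, "B": 1}   # example ratio from step (1)


def frame_budgets(gop_bytes, picture_types, ratio=RATIO):
    """Step (1): split the GOP budget among frames by picture type."""
    total_weight = sum(ratio[t] for t in picture_types)
    return [gop_bytes * ratio[t] // total_weight for t in picture_types]


def block_budget(frame_bytes, blocks_per_frame):
    """Step (2): spread one frame's budget evenly over its blocks."""
    return frame_bytes // blocks_per_frame


GOP_BYTES = 2048 * 124                          # assumed GOP budget
PICTURES = ["I"] + ["P"] * 4 + ["B"] * 10       # assumed GOP composition
budgets = frame_budgets(GOP_BYTES, PICTURES)
per_block_i = block_budget(budgets[0], 330)     # 330 blocks: hypothetical
```

With these assumed figures the I picture receives 84650 bytes, each P picture 28216 bytes and each B picture 5643 bytes.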

[0047] When coding of all the frames of one GOP is 
finished, the actual amount of accumulated data is equal 
to or smaller than the expected amount of accumulated 
data. To completely match the expected amount of ac- 
cumulated data with the amount of data in the video data 
stream in one GOP period, an insufficient amount is 
compensated by stuffing data (e.g., dummy data con- 
sisting of all "0") generated by the stuffing data genera- 
tor 25. 
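
The control of paragraphs [0045] and [0047] may be illustrated by the toy model below. It is a sketch under loudly stated assumptions, not the encoder of Fig. 6: the coded size of a block is modeled as `complexity // qscale`, the quantizer scale is searched over 1 to 31, and all names and numbers are hypothetical. The difference between the expected and actual accumulated amounts up to the preceding block decides how much the current block may produce, and the shortfall at the end of the GOP is filled with stuffing data.

```python
def choose_qscale(complexity, allowance, q_max=31):
    """Smallest quantizer scale whose modeled output fits the allowance."""
    for q in range(1, q_max + 1):
        if complexity // q <= allowance:
            return q
    return q_max  # coarsest scale; a real encoder would need more care


def encode_gop(complexities, block_budget):
    """Return (quantizer scales, actual bytes, stuffing bytes) for one GOP."""
    expected = 0   # expected accumulated data up to the previous block
    actual = 0     # actual accumulated data up to the previous block
    qscales = []
    for complexity in complexities:
        difference = expected - actual
        q = choose_qscale(complexity, block_budget + difference)
        qscales.append(q)
        actual += complexity // q   # toy model of the coded block size
        expected += block_budget
    if actual > expected:
        raise ValueError("toy model overshot the GOP budget")
    stuffing = expected - actual    # filled by the stuffing data generator
    return qscales, actual, stuffing


qscales, actual, stuffing = encode_gop([250, 90, 400, 50], block_budget=100)
```

In this example the GOP budget is 400 units, 323 units are actually coded, and 77 units of stuffing bring the total to exactly 400.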

[0048] In the coding system which conforms to the 
ISO 11172, a bit stream has a plurality of positions where
a proper amount of stuffing bits having a predetermined 
bit pattern can be inserted, and the bit stream is defined 
so that the presence of stuffing bits and the length there- 
of can be discriminated. For example, a stream of MB 
STUFF (macroblock stuffing) data of a macroblock layer 
or the like is used. Further, the quantizer scale is also 
defined to be inserted in the bit stream when it is trans- 
mitted. For example, a stream of QS (quantizer scale) 
of a slice layer is used. 

[0049] The decoder, which decodes a video data 
stream that includes the stuffing data and quantizer 
scale and has a constant amount of GOP data, detects 
various headers inserted in the input bit stream (such 
as the sequence header, GOP start code, picture start 
code and slice start code), and is synchronized with this 
bit stream. The decoder performs decoding of each 
block in the bit stream by referring to the quantizer scale 
and performs no decoding on stuffing data when detect- 
ed, i.e., the stuffing data is not decoded as video or audio 
signals or other information. In other words, the decoder 
disregards the stuffing data and can thus perform de- 
coding without particularly executing the above-de- 
scribed data amount control to make the amount of data 
in each GOP constant. 

[0050] The essential features of the present invention 
will now be described with reference to Figs. 8 through 
10. 

[0051] Fig. 8 is a diagram showing the format of a
pack in a transmission method according to one
embodiment of the present invention and the time-sequential
relation between actual video information and audio
information in that pack.

[0052] As shown in (a) in Fig. 8, one GOP, i.e. 15
frames of video information, stored in a video packet VP
of such a pack, can be considered as having a time slot
of 0.5005 sec as one frame has a time slot of 1/29.97
sec. Thus, the time slot of a pack is set equal to 0.5005
sec, the time slot of the video packet VP.
[0053] According to the system that conforms to the
ISO 11172, 1152 samples of two (R and L) channels of
audio signals as shown in (b) and (c) in Fig. 8 are treated
as one AAU (Audio Access Unit), and are compressed
to data having a fixed length for each AAU. Given that
the sampling frequency of audio signals is 48 kHz, 1152
samples have a time slot of 24 msec, which is the time
slot of a single AAU as shown in (d) in this diagram.
Therefore, the number of AAUs to be stored in one pack
having a time slot of 0.5005 sec is equivalent to
20.854....
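
The figures of paragraphs [0052] and [0053] can be verified with exact rational arithmetic. The sketch assumes that the frame rate written as 29.97 is the exact NTSC rate 30000/1001, which makes the pack time slot exactly 0.5005 sec; the variable names are illustrative.

```python
from fractions import Fraction

FRAME_RATE = Fraction(30000, 1001)     # assumed exact value behind 29.97
pack_seconds = 15 / FRAME_RATE         # one GOP of 15 frames
aau_seconds = Fraction(1152, 48000)    # 1152 samples at 48 kHz

aaus_per_pack = pack_seconds / aau_seconds        # 1001/48 = 20.854...
surplus_21 = 21 * aau_seconds - pack_seconds      # audio longer than video
shortfall_20 = pack_seconds - 20 * aau_seconds    # audio shorter than video

print(float(pack_seconds), float(aau_seconds), float(aaus_per_pack))
```

The ratio 1001/48 shows why exactly 1001 AAUs fit into 48 packs, which is the period used in the following paragraphs.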

[0054] Since decoding of audio signals is performed
for each AAU, the number of AAUs in one pack is "21"
or "20" if that number is to be an integer. If twenty-one AAUs
are to be stored in one pack, audio information lags by
3.5 msec from video information per one pack, and if
twenty AAUs are to be stored in one pack, audio
information leads by 20.5 msec from video information per one
pack, as shown in (b), (c) and (d) in Fig. 8. Accordingly,
forty-one packs each containing 21 AAUs and seven
packs each containing 20 AAUs, amounting to forty-eight
packs having just 1001 AAUs, coincide with the time slot
of 48 packs, 24.024 sec. The relative time difference
between a video packet VP and an audio packet AP in one
pack therefore differs from one pack to another and
returns to the original one in a period of 48 packs.
(Hereinafter, this period will be called "48-pack period.")
[0055] To absorb this time difference, an AAU se- 
quence number is inserted in a data packet DP in each 
pack as information indicating the location of that pack 
in a stream of 48 packs which form one period. 

[0056] Fig. 9 exemplifies the relation among this AAU
sequence number, the number of AAUs in one pack, and
the difference between presentation start times for video
signals and audio signals in a reproducing apparatus.
[0057] It is apparent from Fig. 9 that AAU sequence
numbers "0" to "47" are given to identify packs
numbered from "1" to "48" in one 48-pack period, and AAU
sequence numbers "0" to "47" are also given to identify
subsequent packs in the next 48-pack period starting
from the one with a pack number "49". It is also apparent
that a pack having 20 AAUs occurs about once every
seven packs, and every time this pack is input to the
reproducing apparatus, the difference between the presentation
start times for video signals and audio signals becomes
smaller. At the last pack in the 48-pack period having an
AAU sequence number "47", this difference becomes
zero.
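
One schedule with the properties described for Fig. 9 can be generated as sketched below. This is an assumption-laden illustration: the rule used here (send, in each pack, just enough whole AAUs to cover the video sent so far) is a choice made for the sketch, and the actual table of Fig. 9 may assign the 20-AAU packs or measure the time difference differently.

```python
PACKS_PER_PERIOD = 48
AAUS_PER_PERIOD = 1001
AAU_MS = 24.0      # time slot of one AAU
PACK_MS = 500.5    # time slot of one pack


def aau_schedule():
    """Return (AAUs per pack, audio surplus in msec after each pack)."""
    counts, surplus_ms = [], []
    sent = 0
    for seq in range(PACKS_PER_PERIOD):
        # Ceiling of (seq + 1) * 1001 / 48 in integer arithmetic.
        target = -(-(seq + 1) * AAUS_PER_PERIOD // PACKS_PER_PERIOD)
        counts.append(target - sent)
        sent = target
        # How far the audio sent so far extends beyond the video sent so far.
        surplus_ms.append(sent * AAU_MS - (seq + 1) * PACK_MS)
    return counts, surplus_ms


counts, surplus_ms = aau_schedule()
for seq in (0, 5, 6, 47):
    print(seq, counts[seq], surplus_ms[seq])
```

Under this rule the surplus grows by 3.5 msec with every 21-AAU pack, drops by 20.5 msec with every 20-AAU pack, and is zero at AAU sequence number 47, in line with the behaviour described above.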

[0058] A description will now be given of the
reproducing apparatus that reproduces video and audio
signals and data, which are transmitted in such a stream
of packs.

[0059] Fig. 10 is a block diagram of this reproducing 
apparatus. 

[0060] In Fig. 10, a demultiplexer 31 separates an
input stream of packs into a stream of video packets, a
stream of audio packets and a stream of data packets. 
The separated video packet VP is sent to a video de- 
coder 32, the separated audio packet AP is sent to an 
FIFO (First-In First-Out) memory 33, and the separated
data packet DP is sent to a delay controller 34. The data 
packet DP is also output directly as a data output. The 
video decoder 32 decodes compressed video signals 
and outputs the decoded video signals to a D/A convert- 
er 35. The D/A converter 35 converts the decoded video 
signals to analog video signals and outputs them as a 
video signal output. The FIFO memory 33 functions as 
a variable delay circuit whose delay amount is controlled 
by the delay controller 34, and outputs delayed audio 
packet data to an audio decoder 36. The audio decoder 
36 decodes the compressed audio signals and sends 
the resultant signals to a D/A converter 37. The D/A con- 
verter 37 converts the decoded audio signals to analog 
audio signals and outputs them as an audio signal out- 
put. 

[0061] The delay controller 34 refers to the AAU se- 
quence number in the data packet DP to acquire the dif- 
ference between presentation start times for the video 
output and the audio output, and controls the delay 
amount of the FIFO memory 33 so that this delay 
amount matches with the acquired time difference. Ac- 
cordingly, the synchronous reproduction of the video 
signals and audio signals is achieved. 
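
The cooperation of the delay controller 34 and the FIFO memory 33 may be pictured with the following sketch. It is a software analogy under stated assumptions, not the circuit of Fig. 10: the class names, the millisecond time base and the placeholder delay table are all hypothetical, and the real delay values would be those of Fig. 9.

```python
from collections import deque

PACKS_PER_PERIOD = 48


class DelayController:
    """Maps an AAU sequence number to one of 48 preset delays."""

    def __init__(self, delay_table_ms):
        if len(delay_table_ms) != PACKS_PER_PERIOD:
            raise ValueError("one delay per pack of the period is required")
        self._table = tuple(delay_table_ms)

    def delay_for(self, aau_sequence_number):
        if not 0 <= aau_sequence_number < PACKS_PER_PERIOD:
            raise ValueError("AAU sequence number must be 0 to 47")
        return self._table[aau_sequence_number]


class VariableDelayFifo:
    """FIFO that releases each audio packet after its assigned delay."""

    def __init__(self):
        self._queue = deque()

    def push(self, arrival_ms, delay_ms, packet):
        self._queue.append((arrival_ms + delay_ms, packet))

    def pop_ready(self, now_ms):
        ready = []
        while self._queue and self._queue[0][0] <= now_ms:
            ready.append(self._queue.popleft()[1])
        return ready


# Placeholder table; the real values would come from Fig. 9.
controller = DelayController([0.5 * n for n in range(PACKS_PER_PERIOD)])
fifo = VariableDelayFifo()
fifo.push(arrival_ms=0.0, delay_ms=controller.delay_for(7), packet="AP#7")
```

Because the controller only indexes a fixed 48-entry table, no wide counter or time stamp comparison is needed, which is the simplification the description aims at.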
[0062] As described above, the difference between 
the presentation start times for the video signals and au- 
dio signals included in a pack cyclically varies in a period 
of a certain number of packs, each pack includes infor- 
mation on the number of that pack in a plurality of packs 
that constitutes one period, the difference between the 
presentation start times for the video signals and audio 
signals in each pack is acquired by referring to this in- 
formation on the pack number, and the decoding timing 
is controlled based on this time difference. The synchro- 
nous reproduction of the video signals and audio signals 
can therefore be accomplished easily. 
[0063] As synchronous reproduction requires just 48 delay
patterns for the FIFO memory 33, the circuit structure
becomes simple. Another advantage is that the buffer (FIFO
memory 33 in this case) for controlling the timing of audio
signals can have a small capacity. To match the time axis of
the video signals with the time axis of the audio signals,
it is alternatively possible to control the time axis of the
video signals alone, or to control the time axes of both the
video and audio signals.
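
The figure of 48 delay patterns can be checked with a short calculation. The sketch below is illustrative only. It assumes a GOP of 15 frames at 30000/1001 frames per second and an AAU of 1152 samples at 48 kHz, values that are not stated in this passage, and it assumes that each pack receives as many whole AAUs as fit before the end of its GOP. The function and variable names are hypothetical.

```python
from fractions import Fraction

# Assumed parameters (not given in this passage).
GOP_SECONDS = Fraction(15 * 1001, 30000)  # 15 frames at 30000/1001 fps
AAU_SECONDS = Fraction(1152, 48000)       # 1152 samples at 48 kHz
PERIOD = 48                               # packs per period


def build_offset_table(gop=GOP_SECONDS, aau=AAU_SECONDS, period=PERIOD):
    """Return (counts, offsets) for one pack period.

    counts[n]  -- number of whole AAUs placed in pack n
    offsets[n] -- audio start time minus video start time for pack n
    """
    counts, offsets = [], []
    audio_start = Fraction(0)
    for n in range(period):
        video_start = n * gop
        offsets.append(audio_start - video_start)
        video_end = video_start + gop
        # Assumed packing rule: as many whole AAUs as end by the GOP end.
        k = (video_end - audio_start) // aau
        counts.append(k)
        audio_start += k * aau
    return counts, offsets


if __name__ == "__main__":
    counts, offsets = build_offset_table()
    print(sum(counts), sorted(set(counts)), len(set(offsets)))
```

Under these assumptions one period holds 1001 AAUs in packs of 20 or 21 AAUs each, the audio and video totals are equal over the period, and the 48 start-time differences are all distinct, so a table of 48 entries suffices.
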

[0064] The FIFO memory 33 serving as a variable delay
circuit may be placed at the stage following the audio
decoder 36, or the buffer memory in the audio decoder 36
may also be used as a variable delay circuit.
[0065] Although one pack contains a single data packet, a
single audio packet and a single video packet in the
above-described embodiment, one pack may contain a
plurality of packets of each type. Although two channels of
audio signals are processed, four channels of audio signals
may be dealt with. In the case of four channels, audio
signals should be compressed, AAU by AAU, over the same
time span, two channels at a time, before the audio signals
are stored in a packet. The same applies to the case of
multichannel audio signals. Although the foregoing
description of the embodiment has been given with reference
to the case where video signals are compressed before
transmission, the present invention is also applicable to
the case where video signals are transmitted uncompressed.

[0066] Further, although the embodiment has been explained
as a system which also has a function of recording a stream
of packs on a recording medium, the advantages of the
present invention can be expected as long as the system,
even without the recording function, has a function of
transmitting time-divided video and audio signals.

[0067] In short, according to the method of transmitting
time-divided video and audio signals, the number of unit
audio data blocks to be put in one pack is set in such a way
that the difference between the presentation start times for
the stream of video data and the stream of audio data in one
pack in a predetermined pack period becomes a predetermined
value, and positional information of the pack in the
predetermined pack period is affixed to the pack. In
addition, according to the method of reproducing
time-divided video and audio signals, the difference between
presentation start times for video signals and audio signals
in each pack is acquired by referring to the positional
information in a stream of packs transferred by the above
transmission method, and at least one of the presentation
start times for video signals and audio signals in the
stream of packs is controlled so that the difference between
the presentation start times coincides with the difference
between the presentation start times corresponding to the
positional information.
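
The two methods summarised above can be sketched together as follows. This is a hedged illustration rather than the embodiment itself: the `Pack` structure, the function names and the sign convention (difference taken as audio start minus video start) are assumptions, and the difference table is supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    position: int  # position of this pack within the pack period
    video: bytes
    audio: bytes


def stamp_positions(payloads, period):
    """Transmit side: affix the cyclic position to each (video, audio) pair."""
    return [Pack(i % period, v, a) for i, (v, a) in enumerate(payloads)]


def audio_start_time(pack, video_start, difference_table):
    """Reproduce side: derive the audio start time from the video start time
    and the difference associated with the pack's position."""
    return video_start + difference_table[pack.position]
```

Because the difference depends only on the position, a receiver that joins the stream at any pack can recover it from that pack alone.
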

[0068] With the above design, it is possible to provide a
synchronizing system with a simple structure, which can
accomplish synchronous reproduction without complicating
the control circuit for synchronizing video and audio
signals with each other.



Claims 


1. A method of transmitting time-divided video and audio
signals, comprising the steps of:

coding a predetermined time slot (GOP) of video signals to
form a stream of video data;

coding a predetermined number of samples of audio signals
to form a unit audio data block and forming a stream of
audio data consisting of an integer number of unit audio
data blocks whose aggregate time slot approximately
corresponds to said predetermined time slot;

performing time-division multiplexing on said stream of
video data (GOP) and said stream of audio data, storing
resultant data in a pack (VP, AP) having said predetermined
time slot (GOP) and transferring the video signals and
audio signals in pack periods each comprising a stream of a
predetermined number of said packs (VP, AP);

wherein the total time slot of the video signals in a pack
(VP, AP) period is equal to the total time slot of the audio
signals in the pack period, and the number of audio data
blocks in the packs of a period is varied cyclically so that
the presentation start times for the stream of video data
and the stream of audio data in a pack in the pack period
depend on the position of the pack in the pack period and
are constant for a pack in that position from one pack
period to another, and affixing to the pack positional
information giving the position of the pack in the pack
period.

2. The method according to claim 1, wherein said
predetermined pack period is a period of 48 packs.

3. The method according to claim 1, wherein said po- 
sitional information indicates the number of said 
pack in said predetermined pack period. 

4. The method according to claim 2, wherein said po- 
sitional information indicates the number of said 
pack in said predetermined pack period. 

5. A method of reproducing time-divided video and audio
signals transmitted by the method of claim 1, comprising a
step of referring to positional information of a pack in a
pack period, controlling at least one of the presentation
start times for the video signals and the audio signals for
each pack in the pack period so that the difference between
the presentation start times for the video signals and
audio signals corresponds to the difference between
presentation start times denoted by the position of the
pack in the pack period.

6. The method according to claim 5, wherein said pre- 
determined pack period is a period of 48 packs. 

7. The method according to claim 5, wherein said po- 
sitional information indicates the number of said 
pack in said predetermined pack period. 

8. The method according to claim 6, wherein said positional
information indicates the number of said pack in said
predetermined pack period.



Patentansprüche (claims, translated from the German text)

1. A method of transmitting time-divided video and audio
signals, comprising the following steps:

coding a predetermined time slot (GOP, "Group Of Pictures")
of video signals to form a stream of video data;

coding a predetermined number of samples of audio signals
to form a unit audio data block, and forming a stream of
audio data consisting of a natural number of unit audio
data blocks whose combined time slot approximately
corresponds to the predetermined time slot;

performing time-division multiplexing on the stream of
video data (GOP) and the stream of audio data, storing the
resulting data in a pack (VP, AP) having the predetermined
time slot (GOP), and transferring the video signals and
audio signals in pack periods each comprising a stream of a
predetermined number of the packs (VP, AP);

wherein the total time slot of the video signals in a
period of a pack (VP, AP) is equal to the total time slot of
the audio signals in the pack period, and the number of
audio data blocks in the packs of a period is varied
cyclically so that the presentation start times for the
stream of video data and the stream of audio data in a pack
in the pack period depend on the position of the pack in
the pack period and are constant for a pack in that
position from one pack to another, and affixing to the pack
positional information which gives the position of the pack
in the pack period.

2. The method according to claim 1, wherein the
predetermined pack period is a period of 48 packs.

3. The method according to claim 1, wherein the positional
information indicates the number of packs in the
predetermined pack period.

4. The method according to claim 2, wherein the positional
information indicates the number of packs in the
predetermined pack period.

5. A method of reproducing time-divided video and audio
signals which are transmitted by the method according to
claim 1, comprising a step of referring to positional
information of a pack in a pack period, and controlling at
least one of the presentation start times for the video
signals and the audio signals for each pack in the pack
period such that the difference between the presentation
start times for the video signals and audio signals
corresponds to the difference between the presentation
start times which is denoted by the position of the pack in
the pack period.

6. The method according to claim 5, wherein the
predetermined pack period is a period of 48 packs.

7. The method according to claim 5, wherein the positional
information indicates the number of packs in the
predetermined pack period.

8. The method according to claim 6, wherein the positional
information indicates the number of packs in the
predetermined pack period.


Revendications (claims, translated from the French text)

1. A method of transmitting time-division-multiplexed video
and audio signals, comprising the steps of:

coding a predetermined time slot (GOP) of the video signals
to form a stream of video data;

coding a predetermined number of samples of audio signals
to form a unit audio data block, and forming a stream of
audio data consisting of an integer number of unit audio
data blocks whose aggregate time slot approximately
corresponds to said predetermined time slot;

performing time-division multiplexing on said stream of
video data (GOP) and on said stream of audio data, storing
the data obtained in a pack (VP, AP) having said
predetermined time slot (GOP) and transferring the video
signals and the audio signals in pack cycles each
comprising a stream consisting of a predetermined number of
said packs (VP, AP);

the total time slot of the video signals in a pack (VP, AP)
cycle being equal to the total time slot of the audio
signals in the pack cycle, and the number of audio data
blocks in the packs of a cycle varying cyclically such that
the presentation start instants for the stream of video
data and the stream of audio data in a pack of the pack
cycle depend on the position of the pack in the pack cycle
and are constant for a pack located at that position from
one pack cycle to another, and attaching to the pack of the
pack cycle the pack position information giving the
position of the pack.

2. The method according to claim 1, wherein the
predetermined pack cycle is a cycle of 48 packs.

3. The method according to claim 1, wherein said position
information indicates the number of said packs in said
predetermined pack cycle.

4. The method according to claim 2, wherein said position
information indicates the number of said packs in said
predetermined pack cycle.

5. A method of reproducing time-division-multiplexed video
and audio signals transmitted by the method of claim 1,
comprising a step of referring to the position information
of a pack in a pack cycle, and controlling at least one of
the presentation start instants for the video signals and
the audio signals for each pack of the pack cycle, such
that the difference between the presentation start instants
of the video signals and of the audio signals corresponds
to the difference between the presentation start instants
indicated by the position of the pack in the pack cycle.

6. The method according to claim 5, wherein the
predetermined pack cycle is a cycle of 48 packs.

7. The method according to claim 5, wherein said position
information indicates the number of said packs in said
predetermined pack cycle.

8. The method according to claim 6, wherein said position
information indicates the number of said packs in said
predetermined pack cycle.